ABSTRACT

The purpose of this report is to discuss and compare 
two mathematical models for predicting student enrollments at the 
University of California. One has been proposed in the scientific 
literature and the second has been used by the state of California 
since 1963 to forecast student enrollments. The specific problems 
addressed in this report are the prediction of gross enrollments, 
i.e., freshmen, sophomores, etc., for a particular campus or the 
University as a whole. Although the experimental data are restricted 
to undergraduates, the discussion and conclusions are probably 
appropriate to graduate levels as well. 

MODELS FOR PREDICTING GROSS ENROLLMENTS 
AT THE UNIVERSITY OF CALIFORNIA 



I. INTRODUCTION 
1. Background 

The purpose of this report is to discuss and compare two mathematical 
models for predicting student enrollments at the University of California. 
One has already been proposed in the scientific literature and we will 
refer to it as the Gani-Young-Almond model (GYA), while the second has 
been used by the State of California since 1963 to forecast student 
enrollments; we will refer to the latter as the Grade Progression Ratio 
method (GPR). 

Enrollment forecasts are required for different purposes. At the 
departmental level, for example, they are used to predict faculty 
workloads and make faculty, classroom and advising assignments. At the 
statewide level forecasts are used for overall budget and planning 
purposes. In this report we are primarily concerned with forecasts of 
the latter type in the absence of quota restrictions on total enrollments. 
The effect of such quotas will be studied at a later time. 

Forecasts differ not only in the degree of fineness to which they 
predict various categories of students, but they often refer to different 
periods of time. Predictions of gross enrollments may refer to an upper 
division enrollment during an academic year, while the forecasts of 
departmental majors may only be useful if they refer to class enrollments 
at the beginning of each quarter. Besides the usual statistical question 
of the reliability of any forecast, an item of major importance is the 
degree to which numbers obtained by aggregating grades and time periods 
in one model are consistent with numbers obtained from a separate model 
that predicts gross enrollment figures in an unaggregated form. 

A second item of importance is the clear distinction between the 
variables being predicted and those identified as policy variables. As 
we will point out in later sections of this report, this distinction is 
particularly important in periods where enrollment quotas are imposed 
and one attempts to find admission, redirection and student 
reclassification policies that maintain these quotas or other restrictions 
imposed by a university administration. 

The specific problems that we address in this report are the 
prediction of gross enrollments, i.e., freshmen, sophomores, etc., for 
a particular campus or the University as a whole. Although we restrict 
our experimental data to undergraduates, the discussion and conclusions 
are probably appropriate to graduate levels as well. 

Figure 1 is a plot of the actual numbers and forecasts of Berkeley 
campus enrollments in the period 1963-1967. The solid line refers to 
actual enrollments, the dashed lines to forecasts made before the 
beginning of Fall 1963 and Fall 1964 by the Department of Finance of the 
State of California. 



II. ENROLLMENT FORECASTS 
1. Two Mathematical Models 

Denote by X_i(t) the number of enrolled students in grade i at the 
beginning of time period t. The subscript index might, for example, 
refer to sophomores, and the parameter t might refer to the beginning of 
the fall quarter 1967. Denote by Y_i(t) the new admissions to grade i 
during the t-th time period. It is the purpose of enrollment forecast 
models to make a prediction or estimate of X_i(t+1) on the basis of 
certain historical information and past trends in enrollment and admission 
data. 

One model that has evoked considerable interest assumes that a 
fraction p_{ij} of the students in grade i move to grade j in the next 
period, so that 

    X_j(t+1) = \sum_{i=1}^{m} X_i(t) p_{ij} + Y_j(t+1)        (1)

where it is understood that for some i, \sum_{j=1}^{m} p_{ij} < 1. The p_{ij}'s may 
themselves be time dependent. 
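
As a reading aid, Equation (1) can be sketched in a few lines of Python. This is only a minimal illustration: the function name `gya_step`, the two-grade transition table and all of the enrollment figures are hypothetical and are not taken from the report.

```python
def gya_step(x, p, y_next):
    """One step of Equation (1): X_j(t+1) = sum_i X_i(t) * p[i][j] + Y_j(t+1)."""
    m = len(x)
    return [sum(x[i] * p[i][j] for i in range(m)) + y_next[j] for j in range(m)]


# Hypothetical two-grade system (illustrative numbers only).
# p[i][j] is the fraction moving from grade i to grade j; each row sums to
# less than one because some students leave the system.
p = [
    [0.1, 0.6],  # grade 1: 10% repeat, 60% advance, 30% leave
    [0.0, 0.2],  # grade 2: 20% repeat, 80% leave or graduate
]
x_now = [1000.0, 800.0]   # X(t), assumed current enrollments
y_next = [900.0, 100.0]   # Y(t+1), assumed new admissions

x_next = gya_step(x_now, p, y_next)
print(x_next)
```

With these assumed inputs the next-period figures are 1000 students in the first grade and 860 in the second, up to floating-point rounding.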

Gani (1963) used such a model for predicting gross enrollments in 
the Australian university system. His statistical data seem to 
indicate that it is reasonable to assume that transitions between 
grades have a fixed probability over time for time periods of the order 
of five years. In 1965 Gani adopted a revised model for use at 
Michigan State University which also took into account (i) the number 
of credits completed by the student, and (ii) the possibility of 
transfers between majors. 

The GYA model has the important feature that contributions to the 
enrollments in one grade are identified by their origins (prior grades, 
return to the same grade, new admissions, etc.) and are added to give a 
total enrollment figure. Secondly, it has the appealing feature that 
the fractions p_{ij} can be interpreted as transition probabilities, 
which allows one to adopt useful results from the theory of Markov chains 
even though the process itself may not be Markovian. Thirdly, the 
conditional short-term forecasts given today's enrollments have an 
intuitively correct structure, and finally, the method has some 
experimental evidence to support it. 

A second model that has been used by the State of California 
Department of Finance for predicting gross campus enrollments at the 
University of California is called the Grade Progression Ratio method. 
Although there are no published reports to document its mathematical 
structure, the method is based on defining progression ratios a_{11} and 
a_{j,j+1} for the j-th grade and then using these ratios to predict 
enrollments Z_j(t) for future periods by means of the system of first-order 
difference equations, 

    Z_1(t+1) = a_{11} Z_1(t) + Y_1(t+1)
                                                                (2)
    Z_{j+1}(t+1) = a_{j,j+1} Z_j(t) + Y_{j+1}(t+1)

where a_{11} = p_{11} is the fraction of freshmen who return to that grade in the 
next period. The ratio a_{j,j+1} has the interpretation that it is the fraction of 
continuing students in grade j+1 (total minus new admissions) relative 
to total enrollments of a lower grade at the beginning of a previous 
period. In many of the applications that we have seen these 
coefficients also vary with time. 
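
The difference equations (2) can be sketched as follows. The ratios used are the 1960-1961 values quoted in Section III of this report; the function name `gpr_step` and all enrollment and admission figures are hypothetical and serve only to make the sketch runnable.

```python
def gpr_step(z, a11, ratios, y_next):
    """One step of Equation (2).

    z       -- current enrollments Z_1(t), ..., Z_m(t)
    a11     -- fraction of freshmen returning to the freshman grade
    ratios  -- progression ratios [a_12, a_23, ...] between adjacent grades
    y_next  -- new admissions Y_1(t+1), ..., Y_m(t+1)
    """
    z_next = [a11 * z[0] + y_next[0]]
    for j, a in enumerate(ratios):
        # Grade j+2 (1-based) is fed only by grade j+1 plus new admissions.
        z_next.append(a * z[j] + y_next[j + 1])
    return z_next


# Ratios from Section III; enrollments and admissions are assumed values.
z_now = [4000.0, 3500.0, 3000.0, 2800.0]
y_next = [3800.0, 700.0, 1500.0, 300.0]
z_next = gpr_step(z_now, 0.0970, [0.7877, 0.8766, 0.9944], y_next)
print(z_next)
```

Note that, unlike Equation (1), the forecast for an upper grade here does not use that grade's own current enrollment; only the grade below and the new admissions enter.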

Throughout the remainder of this report we will make the further 
assumption that student flows between grades are only of three types: 
(i) to the same grade, (ii) to the next higher grade, or (iii) by 
departure from the university system. In this case Equation (1) can 
be specialized to yield the result 

    X_{j+1}(t+1) = X_j(t) p_{j,j+1} + X_{j+1}(t) p_{j+1,j+1} + Y_{j+1}(t+1)        (3)

In other words, enrollees in grade (j+1) at the beginning of time 
period (t+1) come from grade j or grade j+1 in the previous period or 
represent new admissions. In the absence of quota restrictions on new 
admissions or total class size, Equation (3) is assumed to represent 
the underlying stochastic process and the problem is reduced to one of 

(a) estimating p_{j,j} and p_{j,j+1} terms 

(b) estimating Y_j(t) terms 

(c) recursively computing forecasts and estimates of fluctuations 
from Equation (3) 

(d) establishing time periods which are natural to the forecasting 
process 

and 

(e) making experimental comparisons of forecast and real data. 

2. One-Period Forecasts 

In this section we consider the distribution of returning and new 
students given today's enrollments. Given the value X_j(t) for the 
enrollments in grade j at time t, the numbers that return to the same 
grade, that advance to the next higher grade or that drop out in the 
succeeding time period are multinomially distributed. To fix notation, 
we denote by X_{j,j} the random number that do not advance, by X_{j,j+1} the 
number that advance from grade j to j+1 in one time period and by X_{j,0} 
the number that drop out. The probability that there are x returning 
to grade j, y advancing and z leaving is 

    \frac{n!}{x!\,y!\,z!} \, p_j^{x} q_j^{y} r_j^{z} ,    x + y + z = n        (4)

where, to simplify notation, we have deleted the time parameter and 
substituted p_j for p_{j,j}, q_j for p_{j,j+1} and r_j for the drop-out 
probability, 1 - p_{j,j} - p_{j,j+1}. 
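
The multinomial probability (4) is straightforward to evaluate numerically. The sketch below is illustrative only: the function name `multinomial_pmf` and the probabilities 0.2 and 0.5 are assumptions, not values from the report.

```python
from math import comb


def multinomial_pmf(x, y, z, p, q):
    """Probability (4) that x students return, y advance and z leave.

    p is the return probability, q the advance probability; the drop-out
    probability is r = 1 - p - q and the class size is n = x + y + z.
    """
    n = x + y + z
    r = 1.0 - p - q
    coeff = comb(n, x) * comb(n - x, y)  # equals n! / (x! y! z!)
    return coeff * p**x * q**y * r**z


# Assumed probabilities for a toy class of five students.
p_return, q_advance = 0.2, 0.5
total = sum(
    multinomial_pmf(x, y, 5 - x - y, p_return, q_advance)
    for x in range(6)
    for y in range(6 - x)
)
print(total)  # the probabilities over all outcomes sum to one, up to rounding
```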

The probability distribution for the total number of continuing 
students, i.e., those remaining in grade plus those advanced from a 
lower grade, given the actual enrollments in the previous period, is 

    \sum_{i} \frac{m!}{i!\,(m-i)!} q_j^{i} (1-q_j)^{m-i} \, \frac{n!}{(k-i)!\,(n-k+i)!} p_{j+1}^{k-i} (1-p_{j+1})^{n-k+i}        (5)

for k continuing students in grade j+1, given m students in grade j and 
n students in grade j+1. 

It follows from (5) or direct expectation arguments that the conditional 
one-period expectations are 

    E[X_{j+1} | m, n] = m q_j + n p_{j+1} + E[Y_{j+1}]        (6)

where, once again, the time parameter is suppressed. With the further 
assumption that the Y_j's are statistically independent of previous X_j's, the 
conditional variance of enrollments in the next period is 

    Var[X_{j+1} | m, n] = m q_j (1 - q_j) + n p_{j+1} (1 - p_{j+1}) + Var[Y_{j+1}]        (7)

given m students in grade j and n students in grade j+1. 

The variance to mean ratio for one-period forecasts of continuing 
students, given that there are m students in grade j and n in grade j+1, 
is 

    1 - Max(q_j, p_{j+1}) \le \frac{m q_j (1 - q_j) + n p_{j+1} (1 - p_{j+1})}{m q_j + n p_{j+1}} \le 1 - Min(q_j, p_{j+1})        (8)

with the GYA model. This variance to mean ratio may be reduced or 
increased by new admissions and transfer of students as is indicated 
by the ratio, 

    \frac{m q_j (1 - q_j) + n p_{j+1} (1 - p_{j+1}) + Var[Y_{j+1}]}{m q_j + n p_{j+1} + E[Y_{j+1}]}        (9)

for next period enrollments. When new admissions can be forecast 
exactly, Var[Y_{j+1}] = 0; the denominator of (9) is larger than that of (8) and 
the variance to mean ratio of enrollments is considerably smaller than 
the corresponding figure for continuing students. If the {Y_j} process 
is characterized by a sequence of independent Poisson variables, then 
the ratio of (9) is larger than the ratio in (8). 
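
The bounds in (8) can be seen from a weighted-average argument. The following is a reading aid rather than part of the original derivation; the weight w is a symbol introduced here for convenience.

```latex
% Let w = m q_j / (m q_j + n p_{j+1}), so that 0 \le w \le 1. Then
\frac{m q_j (1 - q_j) + n p_{j+1} (1 - p_{j+1})}{m q_j + n p_{j+1}}
  = w \, (1 - q_j) + (1 - w) \, (1 - p_{j+1}) ,
% which is a convex combination of (1 - q_j) and (1 - p_{j+1}) and therefore
% lies between 1 - \max(q_j, p_{j+1}) and 1 - \min(q_j, p_{j+1}).
```

For the Poisson case, adding the same quantity E[Y_{j+1}] = Var[Y_{j+1}] to numerator and denominator moves a ratio that is below one closer to one, which is why (9) then exceeds the ratio in (8).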

Consider, for example, the case of sophomore enrollments in the 
year 1961 at the Berkeley campus. When we assume that q_1 = p_{12} = 0.58, 
p_2 = p_{22} = 0.23, m = 3843, n = 3445, the variance to mean ratio for 
continuing students is 0.51. The number of new sophomore admissions in 
1961 is 751 students. If this number were known with certainty, then 
the variance to mean ratio of the one-period enrollment forecast would be 
reduced from 0.51 to 0.41. If, on the other hand, the new sophomore 
admissions for fall 1960 are Poisson with Var[Y_2] = E[Y_2] = 751, then 
the variance to mean ratio for the one-period forecast increases to the 
value of 0.61. The usual method for reporting forecasts is to give a 
figure for the mean plus or minus two standard deviations. In the first 
case we would predict a figure of 3772 ± 79 students enrolled in the 
sophomore class of 1961 while in the latter case we would obtain 
3772 ± 96 students. 
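
The figures in this example can be checked directly from Equations (6) and (7). The sketch below uses only the numbers quoted in the paragraph above; the function name `one_period_moments` and its argument names are illustrative choices, not part of the report.

```python
from math import sqrt


def one_period_moments(m, n, q_j, p_next, mean_y=0.0, var_y=0.0):
    """Conditional mean (6) and variance (7) of next-period enrollment."""
    mean = m * q_j + n * p_next + mean_y
    var = m * q_j * (1 - q_j) + n * p_next * (1 - p_next) + var_y
    return mean, var


m, n, q1, p2, admits = 3843, 3445, 0.58, 0.23, 751

# Continuing students only (no new admissions counted).
cont_mean, cont_var = one_period_moments(m, n, q1, p2)
# New admissions known exactly: they add to the mean but not the variance.
known_mean, known_var = one_period_moments(m, n, q1, p2, mean_y=admits)
# Poisson admissions: mean and variance both equal 751.
pois_mean, pois_var = one_period_moments(m, n, q1, p2, mean_y=admits, var_y=admits)

for label, mean, var in [
    ("continuing only", cont_mean, cont_var),
    ("admissions known", known_mean, known_var),
    ("admissions Poisson", pois_mean, pois_var),
]:
    print(f"{label}: mean={mean:.0f}, ratio={var / mean:.2f}, 2 sd={2 * sqrt(var):.0f}")
```

Rounded to two decimals the three ratios come out as 0.51, 0.41 and 0.61, and the two-standard-deviation bands as 79 and 96 students around a mean of 3772, in agreement with the text.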



Since we are unable to find published accounts for the GPR model, 
it is difficult to guess what the underlying stochastic process might 
be for continuing students. It may be reasonable to assume that each 
undergraduate at grade j ≥ 2 leads to k = 0, 1, 2, 3, ... continuing 
undergraduates at the next higher grade with probability distribution 
{p_k} having mean and variance 

    \mu_j = \sum_k k p_k ;    \sigma_j^2 = \sum_k (k - \mu_j)^2 p_k .        (10)

Our data seem to suggest that for the transitions from grade j to 
grade j+1 

Using conditional expectation arguments and the assumption that each 
student acts independently of all others, we obtain 

    E[Y_{j+1}(t+1) | Y_j(t) = m] = m a_{j,j+1}        (11)

and 

    Var[Y_{j+1}(t+1) | Y_j(t) = m] = \sum_{i=1}^{m} \sigma_j^2 = m \sigma_j^2 .        (12)

It is important to notice that with such a model the one-period 
expectations are unaffected by the number of enrollments, say n, in grade j+1 
at time period t. Furthermore, the variance to mean ratio for continuing 
students is independent of m as distinct from the results of (8) which 
depend on both m and n. These are some of the reasons why one might be 
tempted to rely more heavily on the GYA than on the GPR model. 

3. Admissions, Enrollments and Long Term Forecasts 

If Equation (1) represents the underlying stochastic process, it is 
possible to recursively compute expectations and variances of enrollments 
in future periods. If we write (1) in matrix form, 

    X(t+1) = P X(t) + Y(t+1)        (13)

we compute recursively on t to obtain 

    X(t+1) = P^{t+1} X(0) + \sum_{j=0}^{t} P^{j} Y(t-j+1)        (14)

It is well known that elements of higher powers of P decrease 
geometrically with t; hence the initial enrollments represented by X(0) in the 
matrix equation (14) have vanishingly small effect on distant 
enrollment forecasts. In the case of a university system where few 
students jump grades, the typical lifetime of an undergraduate student 
is of the order of four or five years; thus the major contribution to 
X(t+1) is due to new admissions in one, two or three years just prior to 
the forecast date. Notice, for example, that the second power of P is 

    P^2 = \begin{pmatrix}
      p_{11}^2               & 0                      & 0                      & 0        \\
      p_{12}(p_{11}+p_{22})  & p_{22}^2               & 0                      & 0        \\
      p_{12} p_{23}          & p_{23}(p_{22}+p_{33})  & p_{33}^2               & 0        \\
      0                      & p_{23} p_{34}          & p_{34}(p_{33}+p_{44})  & p_{44}^2
    \end{pmatrix}

Since the magnitude of the typical grade advance probability at the 
University of California lies between 1/2 and 3/5 while the return to 
grade probability is less than or equal to 1/10, terms on the diagonal 
of powers of P tend to become small rapidly in comparison with non-zero 
terms below the diagonal. The advance of students through grades is thus 
rapid in comparison to a typical management hierarchy, where the diagonal 
terms tend to be much larger. 
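
The equivalence of the recursion (13) and the closed form (14), and the fading influence of X(0), can be checked numerically. The sketch below is a hedged illustration: the transition matrix uses an assumed advance probability of 0.55 and an assumed return probability of 0.1 (inside the ranges just quoted), and the enrollment and admission figures are hypothetical.

```python
def mat_vec(a, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def iterate(p, x0, admissions):
    """Apply Equation (13) once per period; admissions[k] plays the role of Y(k+1)."""
    x = list(x0)
    for y in admissions:
        x = [px + yk for px, yk in zip(mat_vec(p, x), y)]
    return x


def closed_form(p, x0, admissions):
    """Equation (14); returns (forecast, part of the forecast due to X(0))."""
    n, periods = len(x0), len(admissions)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # P^0
    from_admissions = [0.0] * n
    for j in range(periods):
        y = admissions[periods - 1 - j]  # Y(t - j + 1)
        from_admissions = [s + c for s, c in zip(from_admissions, mat_vec(power, y))]
        power = mat_mul(p, power)  # now P^(j+1)
    from_initial = mat_vec(power, x0)  # P^(t+1) X(0)
    forecast = [a + b for a, b in zip(from_admissions, from_initial)]
    return forecast, from_initial


# Assumed lower-bidiagonal P in the column-vector convention of (13).
P = [
    [0.10, 0.00, 0.00, 0.00],
    [0.55, 0.10, 0.00, 0.00],
    [0.00, 0.55, 0.10, 0.00],
    [0.00, 0.00, 0.55, 0.10],
]
x0 = [4000.0, 3000.0, 2500.0, 2000.0]               # hypothetical X(0)
admissions = [[3500.0, 500.0, 800.0, 100.0]] * 6    # hypothetical Y(1), ..., Y(6)

stepwise = iterate(P, x0, admissions)
forecast, from_initial = closed_form(P, x0, admissions)
print(forecast)
print(from_initial)
```

With these assumed values the two computations agree, and after six periods the initial enrollments contribute fewer than twenty students to any grade.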

The calculation of expected enrollments at the beginning of period 
t is straightforward: we simply substitute E[X(t)] for X(t) and E[Y(t)] 
for Y(t) in (13) or (14). To get some idea of the fluctuations that we 
can expect, it is useful to compute the variances of X(t). To do this 
we make use of the result that the unconditional variance of X(t+1) is 
related to the conditional variance and expectation of X(t+1) given X(t) 
by means of the formula 

    Var[X(t+1)] = E[Var[X(t+1) | X(t)]] + Var[E[X(t+1) | X(t)]]        (15)

Considering enrollments in the freshman class we have from (14) 

    E[X_1(1)] = p_{11} X_1(0) + E[Y_1(1)]
    E[X_1(2)] = p_{11}^2 X_1(0) + p_{11} E[Y_1(1)] + E[Y_1(2)]        (16)
    E[X_1(3)] = p_{11}^3 X_1(0) + p_{11}^2 E[Y_1(1)] + p_{11} E[Y_1(2)] + E[Y_1(3)]

where X_1(0) is known and given. The variance of X_1(1) is obtained from 

    Var[X_1(1) | X_1(0)] = p_{11}(1 - p_{11}) X_1(0) + Var[Y_1(1)]

    E[X_1(1) | X_1(0)] = p_{11} X_1(0) + E[Y_1(1)]

Since E[X_1(0)] = X_1(0) and Var[X_1(0)] = Var[E[Y_1(1)]] = 0, the 
unconditional variance of X_1(1) is just equal to our earlier result in 
Equation (7) 

    Var[X_1(1)] = p_{11}(1 - p_{11}) X_1(0) + Var[Y_1(1)] .        (17a)

In the next period additional terms enter because 

    Var[X_1(2) | X_1(1)] = p_{11}(1 - p_{11}) X_1(1) + Var[Y_1(2)]

    E[X_1(2) | X_1(1)] = p_{11} X_1(1) + E[Y_1(2)]

and the sum of the expected value of the former and the variance of the 
latter yields the result 

    Var[X_1(2)] = p_{11}^2 (1 - p_{11}) X_1(0) + p_{11}^3 (1 - p_{11}) X_1(0)
                  + p_{11}(1 - p_{11}) E[Y_1(1)] + p_{11}^2 Var[Y_1(1)]
                  + Var[Y_1(2)]        (17b)

In this way one can recursively compute Var[X_j(t)]; efficient matrix 
methods for computing them are described by Pollard (1967) and 
Bartholomew (1967). For long term forecasts, the effect of fluctuations 
due to the initial enrollments becomes small and the dominant terms are 
due to variances in the new admissions of the immediately preceding years 
and the variances due to uncertainty of continuing students. 

In making long term forecasts one of the two cases that usually 
interests us corresponds to the assumption that the Y_j(t) are known exactly; 
with such an assumption, fluctuations in enrollments are entirely due to 
the random nature of attritions and uncertainty in a student's progress 
once he has enrolled. A second case corresponds to Poisson admissions; 
in this case enrollment fluctuations are due to the superposition of 
random admissions with the random behavior of students once they are in 
the system. 
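
For the freshman class, the variance decomposition (15) reduces to a scalar recursion that covers both cases just described. The sketch below is illustrative: the function name `freshman_moments` and the numerical inputs (return probability 0.1, initial class of 4000, admissions of 3500 and 3600) are assumptions, not data from the report.

```python
def freshman_moments(p11, x0, mean_y, var_y):
    """Recursively compute (E[X_1(t)], Var[X_1(t)]) for t = 1, 2, ...

    From (15): Var[X(t+1)] = p11 (1 - p11) E[X(t)] + p11^2 Var[X(t)] + Var[Y(t+1)]
    and        E[X(t+1)]   = p11 E[X(t)] + E[Y(t+1)].
    """
    mean, var = float(x0), 0.0  # X_1(0) is known, so its variance is zero
    out = []
    for ey, vy in zip(mean_y, var_y):
        var = p11 * (1 - p11) * mean + p11**2 * var + vy  # uses E[X(t)]
        mean = p11 * mean + ey
        out.append((mean, var))
    return out


p11, x0 = 0.1, 4000.0            # assumed values
mean_y = [3500.0, 3600.0]        # assumed E[Y_1(1)], E[Y_1(2)]

exact = freshman_moments(p11, x0, mean_y, var_y=[0.0, 0.0])   # admissions known exactly
poisson = freshman_moments(p11, x0, mean_y, var_y=mean_y)     # Poisson admissions

print(exact)
print(poisson)
```

Under these assumptions the two-period variance is roughly 355 when admissions are known exactly and roughly 3990 when they are Poisson, which illustrates how strongly admission uncertainty can dominate the fluctuations.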

Pollard (1967) has shown that if the numbers of new admissions are 
sequences of independent Poisson variables, then the students remaining 
in grade k at the beginning of the next time period are also Poisson 
distributed. Since the total enrollments in any grade are the sum over 
all students who have entered in prior years plus new admissions, the 
total number of enrollments in each grade is also Poisson distributed. 
The usefulness of this result lies in the fact that the variance and 
mean value of the Poisson distribution are equal and that it probably 
represents a realistic estimate of the magnitude of fluctuations in new 
admissions. 

4. The Grade Progression Ratio Method 

To illustrate the mathematical structure of the GPR forecasting 
model we consider forecasts of the four undergraduate grades as obtained 
from Equation (2): 



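    \begin{pmatrix} Z_1(t+1) \\ Z_2(t+1) \\ Z_3(t+1) \\ Z_4(t+1) \end{pmatrix}
    =
    \begin{pmatrix}
      a_{11} & 0      & 0      & 0 \\
      a_{12} & 0      & 0      & 0 \\
      0      & a_{23} & 0      & 0 \\
      0      & 0      & a_{34} & 0
    \end{pmatrix}
    \begin{pmatrix} Z_1(t) \\ Z_2(t) \\ Z_3(t) \\ Z_4(t) \end{pmatrix}
    +
    \begin{pmatrix} Y_1(t+1) \\ Y_2(t+1) \\ Y_3(t+1) \\ Y_4(t+1) \end{pmatrix}

In matrix notation, 

    Z(t+1) = A Z(t) + Y(t+1)        (18)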



The coefficient a_{11} = p_{11} is the fraction of returning freshmen. All 
other non-zero entries lie below the main diagonal and may be less than, 
equal to or greater than one. By iteration one obtains the (t+1)-th 
forecast in terms of the new admissions in prior periods and powers of A, 

    Z(t+1) = A^{t+1} Z(0) + \sum_{j=0}^{t} A^{j} Y(t-j+1) .        (19)

Since a_{11} = p_{11} < 1, it is possible to discuss the asymptotic character 
of powers of A even though values for a_{j,j+1} may be greater than one. 
In fact, Equation (19) is identical to the development of forecasts of 
X_j(t) in powers of P except that A is substituted for P and Z for X. 
The difference lies in the structure of A, its rank, and the fact that 
it may not be possible to neglect high powers of A. For example, the 
second and third powers of A are given by: 

    A^2 = \begin{pmatrix}
      a_{11}^2       & 0             & 0 & 0 \\
      a_{12} a_{11}  & 0             & 0 & 0 \\
      a_{23} a_{12}  & 0             & 0 & 0 \\
      0              & a_{23} a_{34} & 0 & 0
    \end{pmatrix} ,
    \qquad
    A^3 = \begin{pmatrix}
      a_{11}^3              & 0 & 0 & 0 \\
      a_{12} a_{11}^2       & 0 & 0 & 0 \\
      a_{23} a_{12} a_{11}  & 0 & 0 & 0 \\
      a_{23} a_{34} a_{12}  & 0 & 0 & 0
    \end{pmatrix}

and it is likely that the product a_{23} a_{34} a_{12} is large enough so that new 
freshman admissions of earlier years may dominate all other terms in the 
forecasts for seniors. For large t the first column entries of A^t are 

    a_{11}^{t} ,    a_{12} a_{11}^{t-1} ,    a_{23} a_{12} a_{11}^{t-2} ,    a_{23} a_{34} a_{12} a_{11}^{t-3}

while all other entries are zero. Hence for large t the typical 
contribution to the forecast of grade j is due to a fraction of earlier 
freshman enrollments. 
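
The structure of the powers of A can be confirmed numerically. The sketch below uses the 1960-1961 ratios quoted in Section III; the helper names `gpr_matrix`, `mat_mul` and `mat_pow` are illustrative and not part of the report.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def gpr_matrix(a11, a12, a23, a34):
    """The matrix A of Equation (18) for four undergraduate grades."""
    return [
        [a11, 0.0, 0.0, 0.0],
        [a12, 0.0, 0.0, 0.0],
        [0.0, a23, 0.0, 0.0],
        [0.0, 0.0, a34, 0.0],
    ]


def mat_pow(a, t):
    """A raised to the integer power t >= 1 by repeated multiplication."""
    result = a
    for _ in range(t - 1):
        result = mat_mul(a, result)
    return result


a11, a12, a23, a34 = 0.0970, 0.7877, 0.8766, 0.9944  # ratios from Section III
A = gpr_matrix(a11, a12, a23, a34)

t = 5
A_t = mat_pow(A, t)
first_column = [row[0] for row in A_t]
expected = [
    a11**t,
    a12 * a11**(t - 1),
    a23 * a12 * a11**(t - 2),
    a23 * a34 * a12 * a11**(t - 3),
]
print(first_column)
print(expected)
```

For any t of three or more, every column of A^t other than the first is identically zero, so only freshman quantities propagate through the high powers.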
Although we do not make any attempt to discuss the statistical 
problem of estimating p_{ij} or a_{j,j+1} terms, it has been common practice 
by the State of California to estimate values for a_{j,j+1} on the basis 
of recent observations of the random enrollment data, namely, to define 
a time-dependent ratio 

    a_{j,j+1}(t+1) = \frac{X_{j+1}(t+1) - Y_{j+1}(t+1)}{X_j(t)}        (20)

where X and Y now refer to actual realizations of the enrollments. 
Suppose that Equation (3) represents the underlying stochastic process; 
then the enrollment in grade j+1 at time period t+2 is 

    X_{j+1}(t+2) = X_j(t+1) p_{j,j+1} + X_{j+1}(t+1) p_{j+1,j+1} + Y_{j+1}(t+2) .        (21)

If, in making forecasts, we were to use Grade Progression Ratios 
generated by (3) and (20), then 

    a_{j,j+1}(t+1) = p_{j,j+1} + \frac{X_{j+1}(t)}{X_j(t)} p_{j+1,j+1} .        (22)

Generating the sequence of numbers obtained by substituting (22) into 

    Z_{j+1}(t+2) = a_{j,j+1}(t+1) X_j(t+1) + Y_{j+1}(t+2) ,        (23)

yields the result 

    Z_{j+1}(t+2) = X_j(t+1) p_{j,j+1} + X_{j+1}(t) p_{j+1,j+1} \left[ \frac{X_j(t+1)}{X_j(t)} \right] + Y_{j+1}(t+2)        (24)

which, though similar to Equation (21), leads to substantially different 
values when enrollments X_j(t) are time-dependent. 

There are two obvious cases where the sequences generated by (24) 
and (21) do, in fact, agree. If the p_{j+1,j+1} terms vanish, then the only 
contribution to Z_{j+1}(t+2) is due to continuing students from a lower grade 
and new admissions. In this case the sequences {X_j(t)} and {Z_j(t)} are 
identical. 

Furthermore, if an equilibrium has been reached in the sense that 

    X_j(t+1) / X_j(t) = 1    independently of t 

then the ratio in square brackets in (24) is one and we obtain the simpler 
time-independent recursion 

    Z_{j+1} = X_j p_{j,j+1} + Z_{j+1} p_{j+1,j+1} + Y_{j+1}        (25)

which agrees with (21) upon deletion of the time parameter t and 
substitution of Z_{j+1} for X_{j+1}. 
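
The gap between a forecast built from observed progression ratios and the forecast implied by Equation (3) can be illustrated with a small deterministic calculation in which expected values stand in for the random enrollments. The transition probabilities 0.58 and 0.23 are those of the sophomore example in Section II; every enrollment and admission figure below is hypothetical, as are the function names.

```python
def gya_next(x_j, x_j1, p_adv, p_stay, y_next):
    """Equation (3): next-period enrollment in grade j+1."""
    return x_j * p_adv + x_j1 * p_stay + y_next


def gpr_ratio(x_j_prev, x_j1_now, y_j1_now):
    """Equation (20): continuing students in grade j+1 over last period's grade j."""
    return (x_j1_now - y_j1_now) / x_j_prev


p_adv, p_stay = 0.58, 0.23          # p_{j,j+1} and p_{j+1,j+1}
x_j_t, x_j_t1 = 3000.0, 3600.0      # assumed X_j(t), X_j(t+1): a growing lower grade
x_j1_t = 2000.0                     # assumed X_{j+1}(t)
y_t1, y_t2 = 700.0, 750.0           # assumed Y_{j+1}(t+1), Y_{j+1}(t+2)

x_j1_t1 = gya_next(x_j_t, x_j1_t, p_adv, p_stay, y_t1)     # X_{j+1}(t+1) from (3)
ratio = gpr_ratio(x_j_t, x_j1_t1, y_t1)                    # ratio (20), equal to (22)
z_gpr = ratio * x_j_t1 + y_t2                              # GPR forecast (23)
x_gya = gya_next(x_j_t1, x_j1_t1, p_adv, p_stay, y_t2)     # Equation (21)

print(ratio, z_gpr, x_gya)
```

With these assumed inputs the GPR forecast is about 3390 students against about 3505 from Equation (21). The difference comes entirely from the repeat term, which the ratio method scales by the growth of the lower grade rather than by the actual size of grade j+1.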



III. NUMERICAL EXAMPLES AND COMPARISONS 
1. Berkeley Campus Forecasts, 1960-1966 

Table I lists two forecasts and the actual observed enrollments of 
undergraduate students on the Berkeley campus for the period 1962-1966. 
The top entry in each cell is computed by the GYA model with the new 
admission data shown in Table II. It was assumed that the transition 
probabilities for Fall to Spring and Spring to Fall semesters were those 
given by the P_1 (Fall to Spring) and P_2 (Spring to Fall) matrices below, 
written in the convention of Equation (13): 

P_1 (Fall to Spring) 

    Fresh.    Soph.     Junior    Senior 

    .9277     0         0         0 
    .0005     .8612     0         0 
    0         .0313     .9089     0 
    0         0         .0047     .7937 

P_2 (Spring to Fall) 

    Fresh.    Soph.     Junior    Senior 

    .0964     0         0         0 
    .6990     .1001     0         0 
    0         .7924     .1393     0 
    0         0         .7493     .2917 



Entries in both of these matrices were estimated from Berkeley campus 
student performance and attrition data that have been collected and 
summarized by the Office of Institutional Research for the academic 
year 1961. The second entry in each cell is calculated by the GPR 
method using the ratios 

    a_{11} = 0.0970 
    a_{12} = 0.7877 
    a_{23} = 0.8766 
    a_{34} = 0.9944 

published by the State Department of Finance for the year 1960-1961. 
Initial enrollment and admission data for the forecast period are the 
same as those used in the GYA model. The third entry in each cell is 
the actual observed student enrollment. In Table I an italicized 
entry denotes which forecast is closer to the actual enrollment count. 

We should point out that the interval between GPR forecasts was 
one year (Fall semester to Fall semester), while the forecast period 
for the GYA model was one semester; in Table I we only list the values 
appropriate to the beginning of the Fall semester. This fact in 
combination with the intrinsically larger forecast variances of the GPR 
model for a_{j,j+1} > 1 would seem to account for the discrepancies in 
the junior year. 
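
To relate the semester-by-semester GYA step to the annual GPR interval, one can compose the two semester matrices. The sketch below is a hedged illustration only. It assumes that P_1 and P_2 are lower bidiagonal in the column-vector convention of Equation (13), with diagonal entries for students remaining in grade and sub-diagonal entries for students advancing, and it ignores admissions at the start of the Spring semester. The product is not directly comparable to the GPR ratios, whose numerators count all continuing students in the higher grade.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Semester matrices as read from the report (orientation is an assumption).
P1 = [  # Fall to Spring
    [0.9277, 0.0,    0.0,    0.0],
    [0.0005, 0.8612, 0.0,    0.0],
    [0.0,    0.0313, 0.9089, 0.0],
    [0.0,    0.0,    0.0047, 0.7937],
]
P2 = [  # Spring to Fall
    [0.0964, 0.0,    0.0,    0.0],
    [0.6990, 0.1001, 0.0,    0.0],
    [0.0,    0.7924, 0.1393, 0.0],
    [0.0,    0.0,    0.7493, 0.2917],
]

# Fall-to-Fall flow of students already enrolled in the Fall:
# apply P1 first, then P2, i.e. X(fall') = P2 (P1 X(fall)) + admissions.
annual = mat_mul(P2, P1)

labels = ["Fresh.", "Soph.", "Junior", "Senior"]
for i in range(3):
    print(f"{labels[i]} -> {labels[i + 1]} over one year: {annual[i + 1][i]:.4f}")
print(f"Fresh. -> Fresh. over one year: {annual[0][0]:.4f}")
```

Under these assumptions the implied annual freshman return fraction is a little under 0.09, of the same order as the published a_{11} = 0.0970, while the implied annual advance fractions lie between roughly 0.65 and 0.69.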

In applying data to these models we have used the following 
convention: a new student is one coming to the U.C. Berkeley campus 
for the first time; in other words, he has not been registered before. 
A continuing student is one who has, with the exception of the summer 
sessions, a record of continuous registrations. For example, a student 
registered in the Spring semester of 1963 is a continuing student if he 
registers in Fall 1963. A returning student is a student who was once 
registered at the campus but has left for one or more semesters. 

In calculating the diagonal entries of P_1 and P_2 we divided the 
number of continuing and returning students in one semester by 
enrollments of the previous semester. Clearly, from the definitions given 
above it would be better to relate returning students to enrollments of 
a semester two or more periods in the past. 